Suitland
Digital Storage And Memory Projections For 2023, Part 3
This is my third and last blog on digital storage and memory projections for 2023. The previous two articles focused on digital storage and memory devices, including magnetic tape, HDDs and SSDs as well as NAND, DRAM and emerging memories, and also covered developments in shared storage and memory networking. This article focuses on developments in digital storage systems and software and their use in various workflows. In 2022 we began to recover from the impacts of two years of the COVID pandemic, although supply-chain issues lingered and at least partial remote work and remote collaboration seem here to stay. On the other hand, high inflation and the tightening of money supplies to stem it led many technology-driven companies to tighten their belts, lay off workers and moderate their IT infrastructure spending in the second half of the year.
Regularization for Shuffled Data Problems via Exponential Family Priors on the Permutation Group
Wang, Zhenbang, Ben-David, Emanuel, Slawski, Martin
In the analysis of data sets consisting of (X, Y)-pairs, a tacit assumption is that each pair corresponds to the same observation unit. If, however, such pairs are obtained via record linkage of two files, this assumption can be violated as a result of mismatch errors rooted, for example, in the lack of reliable identifiers in the two files. Recently, there has been a surge of interest in this setting under the term "shuffled data", in which the underlying correct pairing of (X, Y)-pairs is represented via an unknown index permutation. Explicit modeling of the permutation tends to be associated with substantial overfitting, prompting the need for suitable methods of regularization. In this paper, we propose for this purpose a flexible exponential family prior on the permutation group that can be used to integrate various structures such as sparse and locally constrained shuffling. This prior turns out to be conjugate for canonical shuffled data problems in which the likelihood conditional on a fixed permutation can be expressed as a product over the corresponding (X, Y)-pairs. Inference is based on the EM algorithm, in which the intractable E-step is approximated by the Fisher-Yates algorithm. The M-step is shown to admit a significant reduction from $n^2$ to $n$ terms if the likelihood of (X, Y)-pairs has exponential family form, as in the case of generalized linear models. Comparisons on synthetic and real data show that the proposed approach compares favorably to competing methods.
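To make the two ingredients named above concrete, here is a minimal sketch, not the authors' method. It shows the classic Fisher-Yates shuffle for drawing a uniformly random permutation, and one hypothetical member of an exponential family on permutations in which the sufficient statistic is the number of displaced indices, so a larger `lam` concentrates mass near the identity (sparse shuffling). The names `fisher_yates`, `sparse_shuffle_log_prior`, `prior_probs` and `lam` are illustrative assumptions; how the paper adapts Fisher-Yates to approximate its E-step is not reproduced here.

```python
import itertools
import math
import random


def fisher_yates(n, rng=random):
    """Return a uniformly random permutation of range(n) (classic Fisher-Yates)."""
    perm = list(range(n))
    # Walk from the last slot down, swapping it with a uniformly chosen slot in [0, i].
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def sparse_shuffle_log_prior(perm, lam):
    """Hypothetical unnormalized log-prior: -lam * (number of i with perm[i] != i)."""
    displaced = sum(1 for i, p in enumerate(perm) if p != i)
    return -lam * displaced


def prior_probs(n, lam):
    """Normalize the prior by brute force over all n! permutations (small n only)."""
    perms = list(itertools.permutations(range(n)))
    weights = [math.exp(sparse_shuffle_log_prior(p, lam)) for p in perms]
    z = sum(weights)  # normalizing constant of the exponential family
    return {p: w / z for p, w in zip(perms, weights)}
```

For example, `prior_probs(3, 1.0)` puts the most mass on the identity `(0, 1, 2)`, less on the three transpositions, and the least on the two 3-cycles. Brute-force normalization is only feasible for tiny `n`, which is one reason sampling-based approximations are used in practice.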
A Two-Stage Approach to Multivariate Linear Regression with Sparsely Mismatched Data
Slawski, Martin, Ben-David, Emanuel, Li, Ping
A tacit assumption in linear regression is that (response, predictor)-pairs correspond to identical observational units. A series of recent works have studied scenarios in which this assumption is violated under terms such as "Unlabeled Sensing" and "Regression with Unknown Permutation". In this paper, we study the setup of multiple response variables and a notion of mismatches that generalizes permutations in order to allow for missing matches as well as for one-to-many matches. A two-stage method is proposed under the assumption that most pairs are correctly matched. In the first stage, the regression parameter is estimated by handling mismatches as contaminations, and subsequently the generalized permutation is estimated by a basic variant of matching. The approach is both computationally convenient and equipped with favorable statistical guarantees. Specifically, it is shown that the conditions for permutation recovery become considerably less stringent as the number of responses $m$ per observation increases. In particular, for $m = \Omega(\log n)$, the required signal-to-noise ratio no longer depends on the sample size $n$. Numerical results on synthetic and real data are presented to support the main findings of our analysis.
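The two-stage idea can be sketched as follows, under assumptions of our own rather than the paper's exact estimator. Stage one swaps in a multivariate Huber regression (fitted by iteratively reweighted least squares) as the contamination-robust estimator, so the few mismatched rows are down-weighted. Stage two matches each response row to the predictor row whose fitted response is nearest; because each response picks independently, this nearest-neighbor rule can produce one-to-many matches. The function names and the parameter `delta` are hypothetical.

```python
import numpy as np


def huber_irls(X, Y, delta=0.1, n_iter=100):
    """Multivariate Huber regression of Y (n, m) on X (n, d) via IRLS.

    Rows whose residual norm exceeds delta get weight delta / ||r_i||, so a small
    fraction of mismatched rows behaves like contamination instead of dominating.
    """
    B = np.linalg.lstsq(X, Y, rcond=None)[0]  # ordinary least-squares start
    for _ in range(n_iter):
        r = np.linalg.norm(Y - X @ B, axis=1)  # per-row residual norms
        w = np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))
        sw = np.sqrt(w)[:, None]  # weighted LS via sqrt-weight row scaling
        B = np.linalg.lstsq(sw * X, sw * Y, rcond=None)[0]
    return B


def match_rows(X, Y, B):
    """For each response row i, return the predictor row j minimizing ||Y[i] - X[j] @ B||."""
    fitted = X @ B  # (n, m): fitted responses for every predictor row
    d2 = ((Y[:, None, :] - fitted[None, :, :]) ** 2).sum(axis=2)  # (n, n) squared distances
    return d2.argmin(axis=1)  # may assign several responses to one predictor row
```

With `Y[i] = X[pi[i]] @ B_true + noise` and only a few rows displaced, `match_rows(X, Y, huber_irls(X, Y))` should return an estimate of `pi`. More responses per row (larger `m`) separate the fitted rows further, which is consistent with the abstract's observation that recovery gets easier as `m` grows.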
Tight Semi-Nonnegative Matrix Factorization
The nonnegative matrix factorization is a widely used, flexible matrix decomposition, finding applications in biology, image and signal processing and information retrieval, among other areas. Here we present a related matrix factorization. A multi-objective optimization problem finds conical combinations of templates that approximate a given data matrix. The templates are chosen so that, as far as possible, only the initial data set can be represented this way. However, the templates are required to be neither nonnegative nor convex combinations of the original data.
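A conical combination means nonnegative coefficients on templates that may themselves have any sign. The following minimal sketch covers only that coefficient step, not the paper's multi-objective template selection. It fits nonnegative coefficients `H` for fixed templates `W` by projected gradient descent on the squared Frobenius error, with step size `1/L`, where `L` is the largest eigenvalue of `W^T W`. A column lying in the cone spanned by the templates is fit exactly, while a column outside it keeps a positive residual; this is the "tightness" the abstract aims for. `conic_fit` and `n_iter` are hypothetical names.

```python
import numpy as np


def conic_fit(W, X, n_iter=2000):
    """Nonnegative H minimizing ||X - W @ H||_F^2 for fixed (possibly signed) templates W."""
    G = W.T @ W
    step = 1.0 / np.linalg.norm(G, 2)  # 1/L with L = largest eigenvalue of W^T W
    H = np.zeros((W.shape[1], X.shape[1]))
    for _ in range(n_iter):
        grad = G @ H - W.T @ X  # gradient of 0.5 * ||X - W H||_F^2
        H = np.maximum(H - step * grad, 0.0)  # project onto the nonnegative orthant
    return H
```

For instance, with templates `[1, 0, 1]` and `[0, 1, -1]` (note the negative entry), the point `2*w1 + 3*w2` is reproduced exactly, whereas `-w1` lies outside the cone and retains a residual norm of `sqrt(1.5)`.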
Linear Regression with Sparsely Permuted Data
Slawski, Martin, Ben-David, Emanuel
In regression analysis of multivariate data, it is tacitly assumed that the response and predictor variables in each observed response-predictor pair correspond to the same entity or unit. In this paper, we consider the situation of "permuted data", in which this basic correspondence has been lost. Several recent papers have considered this situation without further assumptions on the underlying permutation. In applications, the permutation is often known to have additional structure that can be leveraged. Specifically, we herein consider the common scenario of "sparsely permuted data", in which only a small fraction of the data is affected by a mismatch between response and predictors. However, even for sparsely permuted data, the least squares estimator, as well as other estimators that do not account for such partial mismatch, is inconsistent. One approach studied in detail herein is to treat permuted data as outliers, which motivates the use of robust regression formulations to estimate the regression parameter. The resulting estimate can subsequently be used to recover the permutation. A notable benefit of the proposed approach is its computational simplicity, given the general lack of procedures for this problem that are both statistically sound and computationally appealing.
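As an illustration of "robust fit first, then recover the permutation", here is a minimal sketch for a single response, not the paper's specific estimator. Least absolute deviation regression, fitted by iteratively reweighted least squares, stands in for the robust formulation. Rows whose residual exceeds a threshold `tau` are flagged as mismatched, and only those rows are re-matched by sorting: in one dimension, pairing the k-th smallest response with the k-th smallest fitted value minimizes the squared error over all matchings of that subset. The names `lad_irls`, `recover_sparse_perm`, `tau` and `eps` are hypothetical.

```python
import numpy as np


def lad_irls(X, y, n_iter=100, eps=1e-6):
    """Least absolute deviation regression via IRLS with weights 1 / max(|r_i|, eps)."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]  # least-squares start
    for _ in range(n_iter):
        r = np.abs(y - X @ beta)
        sw = 1.0 / np.sqrt(np.maximum(r, eps))  # square roots of the IRLS weights
        beta = np.linalg.lstsq(sw[:, None] * X, sw * y, rcond=None)[0]
    return beta


def recover_sparse_perm(X, y, beta, tau):
    """Estimate perm with y[i] ~ X[perm[i]] @ beta, changing only rows flagged by |residual| > tau."""
    fitted = X @ beta
    perm = np.arange(len(y))  # unflagged rows stay matched to themselves
    flagged = np.flatnonzero(np.abs(y - fitted) > tau)
    # Within the flagged rows, pair the k-th smallest response with the k-th smallest fitted value.
    by_y = flagged[np.argsort(y[flagged])]
    by_fit = flagged[np.argsort(fitted[flagged])]
    perm[by_y] = by_fit
    return perm
```

The sorting step relies on the flagged set being closed under the permutation (every displaced row is flagged and no clean row is), which holds approximately when the noise is small compared with the residuals that mismatches induce.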
A Quasi-isometric Embedding Algorithm
The Whitney embedding theorem gives an upper bound on the smallest embedding dimension of a manifold. If a data set lies on a manifold, a random projection into this reduced dimension will retain the manifold structure. Here we present an algorithm to find a projection that distorts the data as little as possible.
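One common way to quantify how much a linear projection distorts a finite data set is the ratio between the largest and smallest factors by which it stretches pairwise distances (1.0 means isometric up to a global scale). The sketch below computes that ratio and, purely as a baseline, keeps the best of several random orthonormal projections. It is not the algorithm proposed here, whose details are not given in this summary; `distortion`, `best_random_projection`, `trials` and `seed` are hypothetical names.

```python
import numpy as np


def distortion(X, P):
    """max/min over pairs i < j of ||P (x_i - x_j)|| / ||x_i - x_j|| (1.0 = isometry up to scale)."""
    i, j = np.triu_indices(len(X), k=1)
    diffs = X[i] - X[j]  # all pairwise difference vectors, shape (pairs, D)
    ratios = np.linalg.norm(diffs @ P.T, axis=1) / np.linalg.norm(diffs, axis=1)
    return ratios.max() / ratios.min()


def best_random_projection(X, k, trials=200, seed=0):
    """Baseline: among random orthonormal projections R^D -> R^k, keep the least distorting one."""
    rng = np.random.default_rng(seed)
    best_P, best_d = None, np.inf
    for _ in range(trials):
        # Reduced QR of a Gaussian (D, k) matrix gives k orthonormal columns.
        Q, _ = np.linalg.qr(rng.normal(size=(X.shape[1], k)))
        P = Q.T  # shape (k, D), orthonormal rows
        d = distortion(X, P)
        if d < best_d:
            best_P, best_d = P, d
    return best_P, best_d
```

For points sampled from a circle (a 1-manifold) sitting in a 2-plane of `R^10`, a projection to `k = 3` dimensions already stays within Whitney's bound of 2m = 2, and the search returns the projection whose worst-case stretch ratio is smallest among the sampled candidates.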